MCMC Learning
The theory of learning under the uniform distribution is rich and deep, with
connections to cryptography, computational complexity, and the analysis of
boolean functions, to name a few areas. This theory, however, is very limited,
because the uniform distribution and the corresponding Fourier basis are
rarely encountered as a statistical model.
A family of distributions that vastly generalizes the uniform distribution on
the Boolean cube is that of distributions represented by Markov Random Fields
(MRF). Markov Random Fields are one of the main tools for modeling high
dimensional data in many areas of statistics and machine learning.
In this paper we initiate the investigation of extending central ideas,
methods and algorithms from the theory of learning under the uniform
distribution to the setup of learning concepts given examples from MRF
distributions. In particular, our results establish a novel connection between
properties of MCMC sampling of MRFs and learning under the MRF distribution. Comment: 28 pages, 1 figure
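To make the MRF setting concrete, here is a minimal sketch of Gibbs sampling, the MCMC procedure most commonly used to sample MRFs, on an Ising model over {-1,+1}^n; the 4-cycle graph, inverse temperature, and seed below are illustrative choices, not taken from the paper:

```python
import math
import random

def gibbs_sweep(x, neighbors, beta):
    """One Gibbs-sampling sweep over an Ising-model MRF on {-1,+1}^n.
    Each spin is resampled from its conditional distribution given its
    neighbors: P(x_i = +1 | rest) = 1 / (1 + exp(-2 * beta * field))."""
    for i in range(len(x)):
        field = sum(x[j] for j in neighbors[i])
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
        x[i] = 1 if random.random() < p_plus else -1
    return x

# Approximate samples from a 4-cycle Ising model.
random.seed(0)
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
x = [random.choice([-1, 1]) for _ in range(4)]
for _ in range(100):
    x = gibbs_sweep(x, neighbors, beta=0.5)
```

Running the chain long enough yields approximate samples from the MRF; the paper ties the behavior of exactly this kind of chain to learnability under the MRF distribution.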
Learning using Local Membership Queries
We introduce a new model of membership query (MQ) learning, where the
learning algorithm is restricted to query points that are \emph{close} to
random examples drawn from the underlying distribution. The learning model is
intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where
the queries are allowed to be arbitrary points).
Membership query algorithms are not popular among machine learning
practitioners. Apart from the obvious difficulty of adaptively querying
labelers, it has also been observed that querying \emph{unnatural} points leads
to increased noise from human labelers (Lang and Baum, 1992). This motivates
our study of learning algorithms that make queries that are close to examples
generated from the data distribution.
We restrict our attention to functions defined on the n-dimensional Boolean
hypercube and say that a membership query is local if its Hamming distance from
some example in the (random) training data is at most O(log(n)). We show the
following results in this model:
(i) The class of sparse polynomials (with coefficients in R) over {0,1}^n
is polynomial time learnable under a large class of \emph{locally smooth}
distributions using O(log(n))-local queries. This class also includes the
class of O(log(n))-depth decision trees.
(ii) The class of polynomial-sized decision trees is polynomial time
learnable under product distributions using O(log(n))-local queries.
(iii) The class of polynomial size DNF formulas is learnable under the
uniform distribution using O(log(n))-local queries in time
n^{O(log(log(n)))}.
(iv) In addition, we prove a number of results relating the proposed model to
the traditional PAC model and the PAC+MQ model.
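The locality restriction is easy to state in code; the following sketch (with a made-up toy training set) checks whether a candidate query is r-local in the sense above:

```python
def hamming(u, v):
    # Hamming distance between two points of the Boolean hypercube
    return sum(a != b for a, b in zip(u, v))

def is_local_query(q, training_examples, r):
    """A membership query q is r-local if it lies within Hamming
    distance r of some example in the (random) training data."""
    return any(hamming(q, x) <= r for x in training_examples)

train = [(0, 0, 1, 1), (1, 0, 0, 0)]
print(is_local_query((0, 1, 1, 1), train, r=1))  # prints True
print(is_local_query((1, 1, 1, 0), train, r=1))  # prints False
```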
Stable Matching with Evolving Preferences
We consider the problem of stable matching with dynamic preference lists. At
each time step, the preference list of some player may change by swapping
random adjacent members. The goal of a central agency (algorithm) is to
maintain an approximately stable matching (in terms of number of blocking
pairs) at all times. The changes in the preference lists are not reported to
the algorithm, but must instead be probed explicitly by the algorithm. We
design an algorithm that in expectation and with high probability maintains a
matching that has at most O((log n)^2) blocking pairs. Comment: 13 pages
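The quantity being controlled, the number of blocking pairs, can be computed directly; this sketch (with hypothetical rank tables, lower rank meaning more preferred) counts the pairs that would both rather be matched to each other:

```python
def blocking_pairs(match, men_rank, women_rank):
    """Count blocking pairs of a perfect matching.
    match: dict man -> woman; *_rank[p][q] is p's rank of q (0 = best).
    (m, w) blocks if m prefers w to his partner AND w prefers m to hers."""
    partner_of = {w: m for m, w in match.items()}
    count = 0
    for m, w_cur in match.items():
        for w, m_cur in partner_of.items():
            if w == w_cur:
                continue
            if men_rank[m][w] < men_rank[m][w_cur] and \
               women_rank[w][m] < women_rank[w][m_cur]:
                count += 1
    return count

men_rank = {"m1": {"w1": 0, "w2": 1}, "m2": {"w1": 0, "w2": 1}}
women_rank = {"w1": {"m1": 0, "m2": 1}, "w2": {"m1": 0, "m2": 1}}
match = {"m1": "w2", "m2": "w1"}
print(blocking_pairs(match, men_rank, women_rank))  # prints 1: (m1, w1) blocks
```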
Learning DNFs under product distributions via {\mu}-biased quantum Fourier sampling
We show that DNF formulae can be quantum PAC-learned in polynomial time under
product distributions using a quantum example oracle. The best classical
algorithm (without access to membership queries) runs in superpolynomial time.
Our result extends the work by Bshouty and Jackson (1998) that proved that DNF
formulae are efficiently learnable under the uniform distribution using a
quantum example oracle. Our proof is based on a new quantum algorithm that
efficiently samples the coefficients of a {\mu}-biased Fourier transform. Comment: 17 pages; v3 based on journal version; minor corrections and
clarifications
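For intuition, here is a classical Monte Carlo sketch of the quantity the quantum oracle samples: {\mu}-biased Fourier coefficients over a product distribution on {-1,+1}^n. The characters chi_S below are the standard {\mu}-biased basis; the example function is a hypothetical choice, not from the paper:

```python
import math
import random

def mu_biased_char(x, S, mu):
    """mu-biased character chi_S(x) = prod_{i in S} (x_i - mu_i) / sqrt(1 - mu_i^2);
    these are orthonormal under the product distribution with E[x_i] = mu_i."""
    out = 1.0
    for i in S:
        out *= (x[i] - mu[i]) / math.sqrt(1.0 - mu[i] ** 2)
    return out

def estimate_coeff(f, S, mu, samples, rng):
    """Monte Carlo estimate of the coefficient E[f(x) * chi_S(x)]."""
    n = len(mu)
    total = 0.0
    for _ in range(samples):
        x = [1 if rng.random() < (1 + mu[i]) / 2 else -1 for i in range(n)]
        total += f(x) * mu_biased_char(x, S, mu)
    return total / samples

rng = random.Random(0)
f = lambda x: x[0]  # a dictator function, used only as a toy example
est = estimate_coeff(f, {0}, [0.0, 0.0], samples=500, rng=rng)
# with mu = 0, chi_{0}(x) = x_0, so f * chi = x_0^2 = 1 and est is exactly 1.0
```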
Global and Local Information in Clustering Labeled Block Models
The stochastic block model is a classical cluster-exhibiting random graph
model that has been widely studied in statistics, physics and computer science.
In its simplest form, the model is a random graph with two equal-sized
clusters, with intra-cluster edge probability p, and inter-cluster edge
probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is
practically more relevant and also mathematically more challenging. A
conjecture of Decelle, Krzakala, Moore and Zdeborova, based on ideas from
statistical physics, predicted a specific threshold for clustering. The
negative direction of the conjecture was proved by Mossel, Neeman, and Sly
(2012), and more recently the positive direction was proved independently by
Massoulie and by Mossel, Neeman, and Sly.
In many real network clustering problems, nodes contain information as well.
We study the interplay between node and network information in clustering by
studying a labeled block model, where in addition to the edge information, the
true cluster labels of a small fraction of the nodes are revealed. In the case
of two clusters, we show that below the threshold, a small amount of node
information does not affect recovery. On the other hand, we show that for any
small amount of information efficient local clustering is achievable as long as
the number of clusters is sufficiently large (as a function of the amount of
revealed information). Comment: 24 pages, 2 figures. A short abstract describing these results will
appear in the proceedings of RANDOM 2014
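The model in its simplest form is easy to simulate; this sketch samples a two-cluster stochastic block model in the sparse regime p, q = O(1/n) (the constants a and b below are illustrative):

```python
import random

def sample_sbm(n, p, q, seed=None):
    """Sample a two-equal-cluster stochastic block model on n nodes:
    nodes 0..n/2-1 form cluster 0, the rest cluster 1; an edge appears
    with probability p inside a cluster and q across clusters."""
    rng = random.Random(seed)
    half = n // 2
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            same_cluster = (i < half) == (j < half)
            if rng.random() < (p if same_cluster else q):
                edges.append((i, j))
    return edges

# Sparse regime: p = a/n, q = b/n with a > b.
n, a, b = 200, 6.0, 1.0
edges = sample_sbm(n, a / n, b / n, seed=1)
```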
Decentralized Cooperative Stochastic Bandits
We study a decentralized cooperative stochastic multi-armed bandit problem
with K arms on a network of N agents. In our model, the reward distribution
of each arm is the same for each agent and rewards are drawn independently
across agents and time steps. In each round, each agent chooses an arm to play
and subsequently sends a message to her neighbors. The goal is to minimize the
overall regret of the entire network. We design a fully decentralized algorithm
that uses an accelerated consensus procedure to compute (delayed) estimates of
the average of rewards obtained by all the agents for each arm, and then uses
an upper confidence bound (UCB) algorithm that accounts for the delay and error
of the estimates. We analyze the regret of our algorithm and also provide a
lower bound. The regret is bounded by the optimal centralized regret plus a
natural and simple term depending on the spectral gap of the communication
matrix. Our algorithm is simpler to analyze than those proposed in prior work
and it achieves better regret bounds, while requiring less information about
the underlying network. It also performs better empirically
Evolution with Drifting Targets
We consider the question of the stability of evolutionary algorithms to
gradual changes, or drift, in the target concept. We define an algorithm to be
resistant to drift if, for some inverse polynomial drift rate in the target
function, it converges to accuracy 1 - \epsilon, with polynomial resources,
and then stays within that accuracy indefinitely, except with probability
\epsilon, at any one time. We show that every evolution algorithm, in the
sense of Valiant (2007; 2009), can be converted, using the Correlational Query
technique of Feldman (2008), into such a drift-resistant algorithm. For certain
evolutionary algorithms, such as for Boolean conjunctions, we give bounds on
the rates of drift that they can resist. We develop some new evolution
algorithms that are resistant to significant drift. In particular, we give an
algorithm for evolving linear separators over the spherically symmetric
distribution that is resistant to a drift rate of O(\epsilon /n), and another
algorithm over the more general product normal distributions that resists a
smaller drift rate.
The above translation result can be also interpreted as one on the robustness
of the notion of evolvability itself under changes of definition. As a second
result in that direction we show that every evolution algorithm can be
converted to a quasi-monotonic one that can evolve from any starting point
without the performance ever dipping significantly below that of the starting
point. This permits the somewhat unnatural feature of arbitrary performance
degradations to be removed from several known robustness translations.
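A toy simulation of target drift (an illustrative model, not the paper's formal definition): the target is a monotone conjunction over n variables, and at each step, with some inverse-polynomial probability, one random variable is toggled in or out of it.

```python
import random

def eval_conjunction(target, x):
    # target: set of variable indices; true iff every listed bit of x is 1
    return all(x[i] == 1 for i in target)

def drift_step(target, n, rate, rng):
    """With probability `rate`, toggle one uniformly random variable
    in/out of the target conjunction (symmetric difference)."""
    if rng.random() < rate:
        i = rng.randrange(n)
        target = target.symmetric_difference({i})
    return target

rng = random.Random(0)
n, target = 10, {0, 3}
for _ in range(1000):
    target = drift_step(target, n, rate=0.01, rng=rng)
# the target is still a conjunction over the same n variables, but its
# literals have slowly drifted away from the starting set {0, 3}
```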